Newspaper Headlines Extraction from Microfilm Images
نویسندگان
چکیده
Automatic indexing is important for a digital library to provide digitized manuscripts of old document images and their electronic text. As an essential step in creating such a system, this paper discusses the issue of extracting headlines from old newspaper microfilms. Most research on document layout analysis has largely assumed relatively clean images. However microfilm images of old newspapers present a challenge. Such images are usually insufficiently illuminated and considerably dirty. To overcome the problem we propose a new effective method for separating characters from noisy background since conventional threshold selection techniques are inadequate to deal with these kinds of images. A Run Length Smearing Algorithm (RLSA) is applied in the headline extraction. Experiment shows that our approach has improved the recall, precision and combined rates.
منابع مشابه
Automatic Indexing of Newspaper Microfilm Images
This paper describes a proposed document analysis system that aims at automatic indexing of digitized images of old newspaper microfilms. This is done by extracting news headlines from microfilm images. The headlines are then converted to machine readable text by OCR to serve as indices to the respective news articles. A major challenge to us is the poor image quality of the microfilm as most i...
متن کاملPage segmentation and text extraction from gray-scale images in microfilm format
The paper deals with a suitably designed system that is being used to separate textual regions from graphics regions and locate textual data from textured background. We presented a method based on edge detection to automatically locate text in some noise infected grayscale newspaper images with microfilm format. The algorithm first finds the appropriate edges of textual region using Canny edge...
متن کاملEnglish and Persian Sport Newspaper Headlines: A comparative study of linguistic means
Abstract Using rhetorical figures in specialized languages like the language of newspaper headlines is common. The present study attempted to conduct a contrastive analysis of the English and Persian sport newspaper headlines related to the 2014 FIFA World Cup. Toward this end, a corpus consisting of 400 English and 400 Persian headlines published during 12th of June to 13th of July, 2014 was c...
متن کاملEnglish and Persian Sport Newspaper Headlines: A comparative study of linguistic means
Abstract Using rhetorical figures in specialized languages like the language of newspaper headlines is common. The present study attempted to conduct a contrastive analysis of the English and Persian sport newspaper headlines related to the 2014 FIFA World Cup. Toward this end, a corpus consisting of 400 English and 400 Persian headlines published during 12th of June to 13th of July, 2014 was c...
متن کاملText-Line Extraction and Character Recognition of Document Headlines With Graphical Designs Using Complementary Similarity Measure
A method for recognizing characters on graphical designs is proposed. A new projection feature that separates text-line regions from backgrounds, and adaptive thresholding in displacement matching are introduced. Experimental results for newspaper headlines with graphical designs show a recognition rate of 97.7 percent.
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2002